A scheme for evaluating the impact of a given scientific paper, based on the importance of the papers quoting it, is investigated. Introducing a weight for each citation, dependent on the previous scientific achievements of the author of the citing paper, we define the weighting factor of a given scientist. Technically, the weighting factors are defined by the components of the normalized leading eigenvector of the matrix describing the citation graph. The weighting factor of a given scientist, reflecting the scientific output of other researchers quoting his work, allows us to define the weighted number of citations of a given paper, the weighted impact factor of a journal and the weighted Hirsch index of an individual scientist or of an entire scientific institution.
I. INTRODUCTION

Any given scientist is a good scientist if he is considered to be good by a representative group of other good scientists. Such a simple way of evaluating the quality of scientific achievements could be useful two hundred years ago, when the number of scientists was small and a respectable researcher was competent to evaluate the progress in a huge field of science.

Nowadays such an approach is no longer realistic. As the number of universities, scientists, journals and scientific articles keeps growing fast, one is often forced to use parametric measures to characterize the output of a given scientist. Although peer review is still considered to be the most reliable and objective method of evaluating candidates for any scientific position, in view of the large number of applicants one often performs, in the preliminary phase, a screening based on numerical values of performance indices designed to quantify the scientific output of the candidates.

As the status of scientific citations among researchers is rather ambivalent [1], we do not claim that citations of scientific articles directly indicate their quality and importance. On the contrary, we share some doubts, often raised in the literature [2, 3], that trying to measure scientific achievements by any index based on the number of citations may generate certain perverse effects: researchers no longer focus on interesting and relevant research, but instead try to adapt to the popular evaluation criteria. However, looking around we have to agree that various citation indices are used nowadays to evaluate scientists, journals and research institutions.

Thus in this work we shall not discuss the controversial issue of what the optimal measure of scientific achievement is. Instead we review common quantitative measures of scientific quality and discuss possible ways to improve them. The most popular indices used to evaluate the impact of a given article, the influence of a scientific journal on the research community, or the scientific output of a single individual or an entire institution are based only on the quantity of citations in the literature to the articles analyzed. Our aim is to take into account also the quality of the citations, measured by the averaged achievements of the authors of the papers which refer to the article under consideration.

An additional motivation for our research is the controversy concerning the use of the impact factor (IF) [4, 5] to quantify the quality of a scientific journal. On one hand, it was pointed out [6] that the impact factor of a given journal can be manipulated by the editors and publisher. On the other hand, it was often emphasized that while the two-year window for counting the citations of the papers analyzed is perhaps adequate for biology, medicine and some other branches of science, this is rather not the case, e.g., for mathematical journals. These journals score small values of the impact factor, since the preparation of a mathematical article and the entire refereeing procedure often take more than two years. Furthermore, in several branches of science a great role is played by articles which are not quickly forgotten. Thus one could also design and work with an impact factor which takes into account citations gathered over a period of three or five years after the paper was published [7].

In order to identify the papers which contribute most to the IF of a journal, the editors often try to identify the articles which gained the largest number of citations in the first and second year after the year they were published. Editors of some mathematically oriented journals, analyzing a list of articles prepared in this way for their journal, were concerned whether it really represents the most important articles published. In fact they considered that several papers of lesser quality entered this list only because they were simple enough to be understood, and later quoted, by other authors of recent papers of mediocre quality.

On the other hand, it is also believed that in several fields of science the impact factor could be artificially inflated by a number of papers of lesser quality, the authors of which tend to quote several recent articles not directly related to their work. The aim of this practice is to please the editor (if the paper cited was published in the same journal), or to suggest to the referees that the author follows the recent literature, and in this way to improve the chances that the work will be published.
To take these features into account one needs to distinguish, in a statistical sense, the quality of a given citation. In short, a citation by an established scientist, whose numerous papers have already attracted many citations, should be weighted more than a citation by a newcomer to the field. In this paper we suggest a possible solution of this problem by defining the weight of a scientific citation and using this notion to modify and improve performance indices defined earlier in […].

All indices proposed are based on the weighting factor associated with each scientist, which is analogous to the PageRank introduced by Brin and Page […] to characterize the relative importance of web pages and used in the Google web search engine. The weights defined by the components of the leading eigenvector of a suitably defined citation matrix characterize a given citation. These numbers display the desired property of self-consistency: the weight of any citation by a given researcher is larger if his papers are quoted by other scientists whose papers are often quoted.

A similar idea was recently applied to study the citation graph created for publications in the Physical Review family of journals [10] and the graph in the field of biochemistry and molecular biology [11], and was independently put forward in recent lectures of Nielsen [12], who considered using the PageRank algorithm to order individual scientific papers according to their citation graphs. The same algorithm was used to design the Eigenfactor web tool, which takes into account the citation graph to evaluate a proposed measure of the relative importance of scientific journals [13]. While a node of the graph represents a single article in the former approach, or an entire journal in the latter scheme, in this work it will be associated with an individual scientist.



II. CITATION MATRIX AND WEIGHTING FACTORS

Consider a sample of N authors of numerous scientific articles, in which they usually refer to their own previous achievements, but also quote papers of other scientists. Let us assume for a while that all papers considered are written by a single author only; this simplifying assumption will be relaxed later in this section. Each scientist can thus be associated with a vertex of a graph, while any citation in any paper forms a directed link between two vertices; see e.g. [14]. Define a square matrix C of size N such that

a) $C_{ij}$ is equal to the number of times the scientist "j" quoted a paper of his colleague "i";

b) $C_{ii} = 0$ for i = 1, ..., N, hence all self-citations are neglected.

Observe that this citation matrix C is likely not to be symmetric. However, the matrix C is by construction real and contains non-negative entries only. Therefore it fulfills the assumptions of the celebrated Frobenius-Perron (FP) theorem (see e.g. [15, 16]). This implies that

i) there exists an eigenvalue $z_1 = \lambda$ with the largest absolute value, which is real and non-negative,

ii) the entire spectrum $\{z_i\}_{i=1}^{N}$ of C belongs to the disk of radius equal to $\lambda$,

iii) the eigenspace associated with $\lambda$ contains a real eigenstate $x = (x_1, \dots, x_N)$ such that all its components are non-negative.

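Since we do not force the citation matrix C to be stochastic, the leading eigenvalue $\lambda$ need not be equal to unity. However, we will assume here that the graph analyzed is connected in the directed sense (strongly connected), so that C is irreducible. Then the leading eigenvalue $\lambda$ is non-degenerate, and there exists a vector x with positive components, unique up to normalization, such that

$$ C x = \lambda x . \tag{1} $$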
This leading eigenvector can be normalized as

$$ W_i := \frac{N x_i}{\sum_{j=1}^{N} x_j} , \tag{2} $$

which implies that the mean entry is equal to unity, $\langle W_i \rangle_i = 1$.
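The eigenvector in (1) and the rescaling (2) can be computed without any linear-algebra library. Below is a minimal sketch in Python, assuming a small dense matrix stored as nested lists with C[i][j] the number of times scientist j quoted scientist i; the function names are ours, not part of the method. It runs power iteration on C + I rather than on C: the shift leaves the eigenvectors unchanged and, for an irreducible C, makes the Frobenius-Perron eigenvalue strictly dominant, so the iteration converges even for periodic graphs such as a directed cycle.

```python
from typing import List

Matrix = List[List[float]]


def leading_eigenvector(C: Matrix, tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Non-negative leading eigenvector of C, scaled so that its entries sum to one.

    Power iteration is applied to C + I: the shift keeps the eigenvectors of C and
    makes the Frobenius-Perron eigenvalue strictly dominant when C is irreducible.
    """
    n = len(C)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # y = (C + I) x, then renormalize to unit sum
        y = [sum(C[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        s = sum(y)
        y = [v / s for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


def weighting_factors(C: Matrix) -> List[float]:
    """Eq. (2): W_i = N x_i / sum_j x_j, so that the mean weighting factor equals one."""
    x = leading_eigenvector(C)
    n = len(x)
    total = sum(x)
    return [n * xi / total for xi in x]


# Scientist 1 quoted scientist 0 four times; scientist 0 quoted scientist 1 once.
C = [[0, 4],
     [1, 0]]
W = weighting_factors(C)  # approximately [4/3, 2/3]
```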

In this way one can associate a weighting factor $W_i$ with a given scientist i. Such a factor depends not only on the total number of times his papers were quoted by other scientists, $T_i := \sum_{j=1}^{N} C_{ij}$, but also on who referred to his work. However, such a number should not be treated as an optimal measure of the scientific achievement of a researcher. It is more informative than the bare number $T_i$ of total citations, but it shares similar disadvantages. For instance, as emphasized by Hirsch […], the total number of citations can be inflated by single non-representative papers, and it overweights highly quoted review articles versus original research papers. On the other hand, the weighting factors $W_i$ allow us to define other, more suitable indices and parameters. Before proceeding we need to adjust the definition of C and its eigenvector x to make them directly applicable to the problem.



A. Scientific papers with several authors 

Any citation of a single-author paper in an article written by another single author can be interpreted as a unit flow between the corresponding two vertices. Thus the number

$$ L := \sum_{i,j=1}^{N} C_{ij} \tag{3} $$

represents the number of unit links in the graph, equal to the total number of citations (with self-citations excluded).

In practice, papers are often written by several authors, so it is natural to split this coupling uniformly among all the authors involved, in such a way that the sum of the weights for each citation is equal to unity. To this end, consider the process of forming a graph by taking into account one article after another, and in each case scanning through all its references.

Assume that a paper with M authors, defined by the set of indices $J = \{j_1, \dots, j_M\}$, quotes another paper by K authors, described by the set $I = \{i_1, \dots, i_K\}$. Any quotation contributes to the citation matrix according to two rules:

a') if $I \cap J = \emptyset$, then $C_{ij} \to C_{ij} + \frac{1}{KM}$ for all pairs of indices i, j such that $i \in I$ and $j \in J$. In words, an independent citation is taken into account and normalized in such a way that the number L defined by (3) grows by one;

b') if $I \cap J \neq \emptyset$, then $C \to C$: the citation matrix does not change, since one does not want to analyze dependent citations.

Observe that when all papers are written by a single author, rules a') and b') reduce to rules a) and b) discussed before.
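Rules a') and b') translate directly into code. The sketch below assumes a hypothetical input format, one record per citation listing the 0-based author indices of the citing paper (the set J) and of the cited paper (the set I); exact fractions are used so that each independent citation visibly adds one to L in (3).

```python
from fractions import Fraction
from typing import Iterable, List, Sequence, Tuple

# Hypothetical record: (author indices J of the citing paper, author indices I of the cited paper).
Citation = Tuple[Sequence[int], Sequence[int]]


def citation_matrix(n_authors: int, citations: Iterable[Citation]) -> List[List[Fraction]]:
    """C[i][j] accumulates the citations of author i by author j, following rules a') and b')."""
    C = [[Fraction(0)] * n_authors for _ in range(n_authors)]
    for citing, cited in citations:
        J, I = set(citing), set(cited)
        if I & J:
            continue  # rule b'): dependent citation (a shared author), C is left unchanged
        share = Fraction(1, len(I) * len(J))  # rule a'): 1/(K M) for every pair i in I, j in J
        for i in I:
            for j in J:
                C[i][j] += share
    return C


def total_links(C: List[List[Fraction]]) -> Fraction:
    """Eq. (3): L = sum_ij C_ij, equal to the number of independent citations."""
    return sum((sum(row, Fraction(0)) for row in C), Fraction(0))


citations = [
    ((0,), (1,)),    # author 0 quotes a single-author paper of author 1
    ((0, 2), (1,)),  # a paper by authors 0 and 2 quotes author 1
    ((1,), (1, 2)),  # author 1 quotes a paper he co-authored: dependent, ignored
    ((2,), (0, 1)),  # author 2 quotes a paper by authors 0 and 1
]
C = citation_matrix(3, citations)
```

In this example three citations are independent, so L = 3, and the diagonal of C stays zero.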

Let us emphasize that the rules above imply that all quantities considered further do not depend on self-citations and dependent citations, which are known to influence bibliometric indices [17-21]. Alternatively, one could neglect all self-citations but take into account the dependent citations and attribute to them a weight smaller than that characterizing independent citations.

B. Truncated citation graph 

In practice it is hardly feasible to take all scientists into consideration. Even if one aims to make the graph as complete as possible, its truncation at some stage seems inevitable. In any realistic case the citation graph will describe a finite set of N researchers and take into account a given number of their papers and the cumulative list of all their references. This list of citations will likely include references to papers written by authors not belonging to the analyzed set of scientists. To take into account the fact that papers written by the researcher represented by the i-th vertex of the graph are cited by authors outside the graph, we suggest extending the citation matrix by an extra row and an extra column, which jointly represent all truncated vertices. The additional entries read:

c) $C_{i,N+1}$ is equal to the total number of times the papers of scientist "i" were quoted by all authors outside the graph (not belonging to the analyzed set);

d) $C_{N+1,i} := \frac{1}{N} \sum_{k=1}^{N} C_{ki}$ for i = 1, ..., N. This assumption is made to attribute a well-balanced, average weight to all external citations. To be consistent with rule b) we also set

e) $C_{N+1,N+1} = 0$.

Note that the last, fictitious vertex of the graph has no direct meaning, since it only represents the world outside the graph. The eigenvector x of the augmented matrix has N + 1 components, but only the first N of them have the meaning of weighting factors of the N individuals. Therefore its last component $x_{N+1}$ can be neglected, and the same rescaling (2) can be used in the normalization scheme which eventually produces the vector W of size N.
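A minimal sketch of the rules for the truncated graph (the extra row and column for the outside world), assuming that the external citation counts $C_{i,N+1}$ are available as a separate list. The eigenvector routine is a shifted power iteration (on A + I), and only the first N components are kept and rescaled with (2). All function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def augment_with_outside(C: Matrix, external: Sequence[float]) -> Matrix:
    """(N+1) x (N+1) matrix; external[i] = citations of scientist i from outside the graph."""
    n = len(C)
    # Extra column: citations of i by all outside authors.
    A = [[float(v) for v in row] + [float(external[i])] for i, row in enumerate(C)]
    # Extra row: C_{N+1,i} = (1/N) sum_k C_{k,i}, the mean of column i of C.
    last_row = [sum(C[k][i] for k in range(n)) / n for i in range(n)]
    A.append(last_row + [0.0])  # no self-loop for the outside vertex
    return A


def leading_eigenvector(A: Matrix, tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Shifted power iteration on A + I; the result has entries summing to one."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(A[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        s = sum(y)
        y = [v / s for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge")


def weights_with_outside(C: Matrix, external: Sequence[float]) -> List[float]:
    """Drop the fictitious last component and rescale the rest as in eq. (2)."""
    x = leading_eigenvector(augment_with_outside(C, external))[:-1]
    n, total = len(x), sum(x)
    return [n * xi / total for xi in x]


# Two scientists quoting each other once; scientist 0 is also quoted 3 times from outside.
W = weights_with_outside([[0, 1], [1, 0]], external=[3, 0])
```

As expected, the external citations raise the weight of scientist 0 above that of scientist 1.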



III. WEIGHTED PERFORMANCE INDICES 

A. Weighted number of citations 

After constructing the complete citation matrix C, or its approximation obtained according to the rules specified above, we find its leading eigenvector and normalize it as in (2) to obtain the vector of weighting factors $W_i$, i = 1, ..., N. Spectra and leading eigenvectors of some exemplary graph matrices are discussed in Appendix A, while the issue of uniqueness of the vector corresponding to the leading eigenvalue is discussed in Appendix B. For any scientific article A one may find in an appropriate database the number of times it was quoted by all other scientific papers in the literature. Denoting this number by c(A), we are now in a position to define the weighted number of quotations,

$$ w(A) := \sum_{j=1}^{c(A)} W_j , \tag{4} $$

where $W_j$ denotes the weighting factor of the j-th author quoting the paper A. For consistency we do not include any self-citations in this sum. In the more general case of papers written by several authors it is natural to take the average weight of these authors. Therefore we write

$$ w(A) := \sum_{j=1}^{c(A)} \frac{1}{n_j} \sum_{\mu=1}^{n_j} W_{j_\mu} , \tag{5} $$

where c(A) represents the number of independent quotations of the paper A, while $n_j$ denotes the number of authors of the j-th paper quoting A, and $W_{j_\mu}$ is the weighting factor of the $\mu$-th co-author of this paper.
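Given the weighting factors, eq. (5) is a short loop. In this sketch authors are identified by hypothetical string keys, each citing paper is given as the list of its authors, and dependent citations (a citing paper sharing an author with A) are skipped, so that only the c(A) independent quotations enter the sum.

```python
from typing import Iterable, Mapping, Sequence


def weighted_citations(cited_authors: Sequence[str],
                       citing_papers: Iterable[Sequence[str]],
                       W: Mapping[str, float]) -> float:
    """Eq. (5): every independent citing paper contributes the mean weight of its authors."""
    cited = set(cited_authors)
    total = 0.0
    for authors in citing_papers:
        if cited & set(authors):
            continue  # self or dependent citation: excluded from c(A)
        total += sum(W[a] for a in authors) / len(authors)
    return total


W = {"alice": 2.0, "bob": 0.5, "carol": 1.0, "dave": 1.2}
citing = [["alice"], ["bob", "carol"], ["dave", "alice"]]
w_A = weighted_citations(["dave"], citing, W)  # 2.0 + (0.5 + 1.0) / 2 = 2.75
```

When every citing paper has a single author, the inner average is just $W_j$ and the function reduces to eq. (4).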



B. Weighted impact factor of a journal 

Let $Z_y$ denote the number of papers published by a certain scientific journal J in year y. To quantify the impact the journal exerts on the scientific community one often uses the so-called impact factor. To compute it one takes all $Z_{y-2} + Z_{y-1}$ articles published one or two years earlier, and then sums the numbers of citations $c(A_i)$ which each article from this set received during year y. The result is then normalized by the total number of articles published in journal J during the two-year time span [4, 5],

$$ \mathrm{IF2}_y(J) := \frac{1}{Z_{y-2}+Z_{y-1}} \sum_{i=1}^{Z_{y-2}+Z_{y-1}} c(A_i) . \tag{6} $$



This commonly used index takes into account a two-year time window, so we shall denote it by IF2.

By construction this quantity takes into account only the quantity of the citations received by articles published in a given journal during the last two years, but not their quality. The approach presented here allows us to take into consideration who quoted the papers analyzed. In full analogy to (6) we thus define the weighted impact factor (WIF2) of a journal J,

$$ \mathrm{WIF2}_y(J) := \frac{1}{Z_{y-2}+Z_{y-1}} \sum_{i=1}^{Z_{y-2}+Z_{y-1}} w(A_i) . \tag{7} $$



The only difference is that instead of counting the bare numbers of citations $c(A_i)$ of a given article $A_i$, we now sum the weighted citations $w(A_i)$. Since these numbers reflect, in a sense, the quality of the citations, we tend to believe that the weighted impact factor is a more accurate quantity to evaluate the quality of a scientific journal than the standard IF.

As mentioned in the introduction, in some disciplines like mathematics and mathematical physics the process of preparing an article and publishing it often takes longer than the two-year time span used in the definition of IF2. Therefore one may propose [7] to use also similar quantities defined for a longer time window of five years. In full analogy to the previous definitions we write

$$ \mathrm{IF5}_y(J) := \frac{1}{m} \sum_{i=1}^{m} c(A_i) , \qquad \text{where} \quad m = \sum_{i=1}^{5} Z_{y-i} , \tag{8} $$

and

$$ \mathrm{WIF5}_y(J) := \frac{1}{m} \sum_{i=1}^{m} w(A_i) . \tag{9} $$



Here one takes into account all articles published in the five-year time span, while $c(A_i)$ and $w(A_i)$ now denote the number of all citations and the sum of weighted citations which a given paper $A_i$ from this sample obtained in the analyzed year y. The five-year impact factor could be especially useful to characterize mathematical journals and journals devoted to those fields of science in which papers are produced at a slower pace and the citation half-life is longer, since the articles do not become obsolete after a few years.
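Definitions (6)-(9) differ only in the length of the window, so one function can serve for both. The sketch below assumes a hypothetical per-article record holding the publication year and, for each later year, the raw count $c_y$ and the weighted count $w_y$; the normalization $Z_{y-1} + \dots + Z_{y-\text{window}}$ is obtained by counting the records that fall in the window.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class Article:
    """Hypothetical record: publication year and per-year citation counts c_y and w_y."""
    year: int
    c_by_year: Dict[int, int] = field(default_factory=dict)
    w_by_year: Dict[int, float] = field(default_factory=dict)


def journal_impact_factors(articles: Sequence[Article], y: int,
                           window: int = 2) -> Tuple[float, float]:
    """(IF, WIF) of a journal in year y: eqs. (6)-(7) for window=2, eqs. (8)-(9) for window=5."""
    pool = [a for a in articles if y - window <= a.year <= y - 1]
    m = len(pool)  # Z_{y-1} + ... + Z_{y-window}
    if m == 0:
        raise ValueError("no articles were published in the window")
    impact = sum(a.c_by_year.get(y, 0) for a in pool) / m
    weighted = sum(a.w_by_year.get(y, 0.0) for a in pool) / m
    return impact, weighted


journal = [
    Article(2005, {2008: 10}, {2008: 10.0}),
    Article(2006, {2008: 4}, {2008: 2.0}),
    Article(2007, {2008: 2}, {2008: 3.0}),
    Article(2008, {2008: 7}, {2008: 5.0}),  # published in year y itself: never counted
]
if2, wif2 = journal_impact_factors(journal, 2008)            # (3.0, 2.5)
if5, wif5 = journal_impact_factors(journal, 2008, window=5)  # (16/3, 5.0)
```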



C. Weighted impact factor of a paper 

Since the distribution of citations is known to be skewed […], providing only the average number of citations is by far not sufficient to characterize the entire distribution. Hence it is not possible to use the impact factor of a journal as an estimate of the number of citations a typical paper published there will obtain during the next two years. Moreover, as explicitly emphasized by Seglen [22], the number of citations obtained by a given article is not influenced by the impact factor of the journal in which it appeared.

To make any reasonable evaluation of the impact a given article had on the scientific community, one can analyze its contribution to the impact factor of the journal. To this end we define the impact factor of an article A published in year y as the sum of citations gained in the next two years,

$$ \mathrm{AIF2}_y(A) := c_{y+1}(A) + c_{y+2}(A) , \tag{10} $$



since only these citations contribute to the impact factor IF2. Here $c_y(A)$ denotes the number of times the paper A was quoted during year y. Note that the article impact factor (AIF2) can be defined only for articles published more than two years ago. In view of the statistical properties of the citation distribution it is therefore clear that for any paper older than two years this very quantity should be used to describe its impact on the field, instead of the IF of the journal in which it was published. In a similar manner, for papers older than five years one can also define the five-year article impact factor (AIF5).

To take into account also the quality of each citation, we can use the weighted citations $w_y$ introduced in the previous section and define the weighted article impact factor (WAIF2)

$$ \mathrm{WAIF2}_y(A) := w_{y+1}(A) + w_{y+2}(A) , \tag{11} $$



where $w_y(A)$ denotes the sum of the weighted citations of the paper A, defined as in (5), for citations gained during year y. By construction this notion is applicable to articles published at least two years earlier.
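Eqs. (10) and (11) only look at the calendar years following publication. A sketch, again assuming per-year dictionaries of raw and weighted citations for one article (a hypothetical layout); the guard encodes the requirement that the window must already be closed, and window=5 gives the five-year variant AIF5 mentioned above.

```python
from typing import Dict, Tuple


def article_impact_factors(pub_year: int,
                           c_by_year: Dict[int, int],
                           w_by_year: Dict[int, float],
                           current_year: int,
                           window: int = 2) -> Tuple[int, float]:
    """(AIF, WAIF) of one article published in pub_year; eqs. (10)-(11) for window=2."""
    if current_year <= pub_year + window:
        raise ValueError("the citation window has not closed yet")
    years = range(pub_year + 1, pub_year + window + 1)  # y+1, ..., y+window
    aif = sum(c_by_year.get(t, 0) for t in years)
    waif = sum(w_by_year.get(t, 0.0) for t in years)
    return aif, waif


c = {2005: 1, 2006: 3, 2007: 5, 2008: 7}
w = {2005: 0.4, 2006: 1.5, 2007: 2.0, 2008: 9.0}
print(article_impact_factors(2005, c, w, current_year=2009))  # (8, 3.5)
```

Citations from the publication year itself and from later years do not enter, in line with the definition of IF2.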



D. Weighted Hirsch index 

To quantify the scientific research output of a given researcher one often uses the h index introduced by Hirsch […]. For a given scientist this index is equal to h if h of all the papers he has written were quoted at least h times each. Ordering his articles $A_k$ according to the number of citations $c(A_k)$ they have ever received, in non-increasing order, one can write

$$ h := \max\{ k : c(A_k) \ge k \} . \tag{12} $$



Although the h index gained considerable popularity and became the subject of several research papers […], several of its drawbacks were emphasized […]. As in the case of the impact factor, the Hirsch index is not capable of differentiating between relevant and less relevant citations.
Making use of the weighted citation number $w(A_k)$ of an article, we may thus introduce the weighted h index

$$ w := w(A_k) , \tag{13} $$

where k is the maximal integer such that $w(A_k) \ge k$, with the articles ordered according to non-increasing values of $w(A_k)$. This index enjoys all the virtues of the original h index recently emphasized in […, 25], but additionally it takes into account the scientific achievements of the authors quoting the work of the evaluated individual. Since the weights $w(A_k)$ determined by the citation graph take into account the number of authors of the paper, the weighted index w does not suffer from a crucial drawback [3] of the original Hirsch index, in which a paper with a hundred co-authors is treated in the same way as an article written by a single scientist. Furthermore, the weights w are in general real numbers, so the index w may take non-integer values. Thus this quantity provides a finer differentiation of the group analyzed than the index h, which is an integer by definition.
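Both (12) and (13) scan the papers in order of decreasing (weighted) citations. A sketch assuming the counts $c(A_k)$ and $w(A_k)$ are already available as plain lists; returning 0 when no paper qualifies is our convention, not part of the definitions.

```python
from typing import Sequence


def hirsch_index(citations: Sequence[int]) -> int:
    """Eq. (12): h = max{k : c(A_k) >= k}, papers sorted by decreasing c."""
    h = 0
    for k, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < k:
            break  # the list is sorted, so no later paper can satisfy the condition
        h = k
    return h


def weighted_hirsch_index(weighted: Sequence[float]) -> float:
    """Eq. (13): w = w(A_k) for the largest k with w(A_k) >= k, papers sorted by decreasing w."""
    w = 0.0
    for k, value in enumerate(sorted(weighted, reverse=True), start=1):
        if value < k:
            break
        w = value  # the index takes the weighted count itself, hence non-integer values
    return w


print(hirsch_index([10, 8, 5, 4, 3]))                # 4
print(weighted_hirsch_index([12.0, 6.5, 3.2, 2.9]))  # 3.2
```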

The possibility of using the h index to quantify the scientific production of an entire institution was recently advocated in [27]. Hence one can use the weighted index w for this purpose as well. Furthermore, following Schubert [28], one can easily adopt his idea of successive performance indices and define an analogue of the index $h_2$. To be concrete, the weighted successive index $w_2$ of a scientific institution is the largest integer n such that the institution employs n scientists whose weighted Hirsch indices $w^{(i)}$ are all equal to or larger than n.
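The successive index applies the same ranking step one level up, to the individual weighted indices $w^{(i)}$ of the scientists employed by an institution. A minimal sketch; the input values are purely illustrative.

```python
from typing import Sequence


def successive_index(individual_w: Sequence[float]) -> int:
    """Largest n such that n scientists have a weighted Hirsch index of at least n."""
    n = 0
    for k, value in enumerate(sorted(individual_w, reverse=True), start=1):
        if value < k:
            break
        n = k
    return n


staff = [12.4, 7.1, 5.0, 3.3, 2.2]  # illustrative weighted Hirsch indices w^(i)
print(successive_index(staff))      # 3
```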



E. Weighted efficiency index

Let us emphasize here that one should not directly compare the Hirsch indices of scientists working in different research fields. This is due to the fact that the numbers of papers and citations vary from one scientific field to another […], so the mean values of the Hirsch indices also differ. On one hand, one could compare the values of the indices rescaled against the average value in a given field [23]. On the other hand, one may work with other indicators which reflect the citing patterns of each community.

As an example of such a quantity one may consider the number of 'known papers' produced by a given researcher. Defining a known paper as an article quoted at least as many times as the number of references cited in it, we see that this notion by construction takes into account the citation habits of a given field.

To set a simple normalization scale useful for comparing citations gained by articles from various disciplines, Plomp […] introduced the efficiency of a given paper. It is defined by the ratio

$$ E(A) := \frac{c(A)}{r(A)} , \tag{14} $$

where c(A) denotes the number of citations gained, while r(A) is equal to the number of articles quoted in work A. In this way the various citation habits, which differ between fields of science, are automatically taken into account. Moreover, the role of a single citation of a review paper seems to be adequate, since a good review may attract a lot of citations, but its list of references is usually also long.

In loose analogy to the Hirsch index, we can define an efficiency index (e-index), which quantifies the research output of a given researcher,

$$ e := \max\{ k : E(A_k) \ge 1 \} , \tag{15} $$

where the articles $A_k$ are ordered according to non-increasing efficiency. In other words, for a given author we count the number of his scientific papers which belong to the class of 'known papers': they have gained at least as many citations as there are items in the list of references of the article.

In the spirit of this work we may improve this quantity and define the weighted efficiency index

$$ e' := \max\{ k : w(A_k) \ge r(A_k) \} , \tag{16} $$

with the articles now ordered according to non-increasing values of $w(A_k)/r(A_k)$. Now we count the number of articles for which the weighted number of citations w is larger than or equal to the total number of references r. Such an index is perhaps not as sophisticated as the Hirsch index, but its values are by construction less dependent on the working habits in a given scientific discipline.

IV. CONCLUDING REMARKS

Analyzing the entire citation graph and citation matrix, one can obtain weighting factors which quantify the total impact of a single researcher on the scientific literature. It will be interesting to analyze the statistical distribution of weighting factors for the citation graph representing the entire scientific literature and for certain particular branches of science. An empirical study of papers on high energy physics […] and computer science […] reveals that the probability P(k) that a given article is cited k times decays according to a power law, $P(k) \sim k^{-\alpha}$. A power-law distribution of the weighting factors attributed to individual scientific papers on molecular biology and biochemistry was recently reported by Ma et al. [11]. Thus one could verify whether a similar behavior is observed for the distribution of weighting factors characterizing the group of scientists working in a given field.

The weighting factors attributed to a given scientist are useful to introduce further bibliometric quantities. For a given article one defines its weighted number of citations, for a journal its weighted impact factor, and for a given scientist the w index, i.e. his weighted Hirsch index.

Analogous quantities can be introduced for groups of researchers or entire scientific institutions, but their normalization and interpretation have to be performed with a certain caution [27]. Similarly, cumulative w indices can be used for various scientific fields and sub-fields to identify so-called 'hot topics' […]. The use of the weighted indices in all these cases could be superior to the standard quantities in the sense that the proposed approach takes into account the average quality of the citations of scientific articles.

However, it should be emphasized explicitly that the computation of weighted scientometric indices is not entirely straightforward, and for any practical purpose one needs to cope with several technical problems. For instance, one has to deal with different authors with identical names, with scientists who change their name during their career, and with scientists whose name was transcribed into the Latin alphabet in several different ways. In general one might think that such cases do not occur very often […], so they should not induce statistically significant effects on the weighted indices of all other authors, but these important problems definitely require further studies. Some remarks on the selection of bibliometric data and other practical issues are provided in Appendix B.

We shall now pass to some more general remarks. Although we tend to agree that the existing bibliometric indices can be further developed and improved, we do not claim that there exists a single number capable of quantifying scientific achievement in an unambiguous way. On the other hand, one should not neglect the possibility of making wise use of bibliometric data and various impact factors. Let us quote, however, the opinion of Adler et al. […]: "While it is incorrect to say that the impact factor gives no information about individual papers in a journal, the information is surprisingly vague and can be dramatically misleading".

Similarly, bibliometric data should not play the decisive role in any peer review process. For instance, when working with applications for Advanced Grants of the European Research Council (ERC), the panels of experts tried hard to evaluate the quality of the projects and the scientific achievements of the principal investigators, without paying too much attention to their scientometric indices. However, an a posteriori statistical analysis found a clear correlation [35] between the final outcome of the 2008 grant competition in the PE-2 panel and the bibliometric benchmark suggested by ERC and used in the proposals: the total number of citations of ten papers chosen by each applicant from his list of publications of the last decade.

Let us then conclude this article with some concrete comments concerning the practical use of scientometric data. They will be separately addressed to three groups of readers.

a) Scientists. Do your research well, write good papers and try to publish them in good scientific journals. When writing your articles, cite those papers which should be quoted according to the established habits in your field. Do not care too much about various impact factors and indices. Any good scientist will have sound numbers with respect to any (reasonable) measure and scientometric indicator. Do not waste your time and energy on a silly game of artificially inflating the values of the scientometric indices which might be used to characterize your research output.



b) Reviewers. Scientists involved in any kind of evaluation should make use of their knowledge of the field and not treat bibliometric data as a definite answer to any question. During the peer review process all scientometric indicators should be considered as auxiliary data only. When there is a need to characterize the impact of a given article published more than three years ago, one should use the number of citations it gained instead of the impact factor of the journal in which it was published. Furthermore, bibliometric indicators should always be normalized against the average computed for scientists working in a similar field of science and in the corresponding period of time.

c) Managers of science. Scientific activity has multiple goals, so try to avoid the harsh consequences of projecting a multidimensional system onto a single axis. Therefore do not hope for a unique scientometric indicator which could be widely used as a universal evaluation tool. Each bibliometric index has certain advantages and some drawbacks, but using several of them in parallel reduces the risk of manipulating the data. Support a versatile use of scientometry, in which the researcher under evaluation takes an active part. For instance, consider the benchmarks used by applicants for the ERC grants: any senior researcher selects his ten best papers published in the last decade and provides the number of citations each of them received. A junior scientist has to choose his best five papers published during the last five years.

To summarize, it is not fair to say that bibliometric data carry no valuable information whatsoever. However, it is not as simple to decode a piece of relevant information from them as it may look at first glance. Thus we would not discourage the use of scientometric data, provided they are used in a wise and reasonable way.

Note added. After the first version of this work was completed, a new paper by Radicchi et al. was posted on the web and later published [36]. The authors of this article put forward a similar idea of applying the PageRank algorithm to the citation graph in which each vertex represents an individual author. Working with a data set composed of the collection of Physical Review journals published between 1893 and 2006, they concluded that the numerical values of the weighted indicator obtained in this way for several physicists correlate well with their scientific achievements as measured by some of the main prizes in physics, including the Nobel prize, the Boltzmann medal, the Wolf prize, the Dirac medal and the Planck medal.

It is a pleasure to thank P. Białas, W. Burkot,
G. Harańczyk, M. Kuś and W. Słomczyński for helpful
discussions and C. M. Bender for fruitful correspondence.
This work was performed during the author's spare time
and was not supported by any funding agency.
Appendix A: Some exemplary graph matrices and
their leading eigenvectors 

In this appendix we provide examples of some simple
matrices and analyze the properties of their leading
eigenvectors. Although a matrix of small size N directly
represents only a small citation graph describing a small
group of N scientists, it can also be used to model a
huge graph with a sub-graph structure: each vertex may
then represent a given field or subfield of science.
Therefore studying even such oversimplified cases can be
helpful in understanding the properties of the connectivity
matrix of a citation graph and its leading eigenvector.

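Let us start with the simplest case of $N = 2$,

$$ C_2 = \begin{pmatrix} 0 & a \\ b & 0 \end{pmatrix}, \qquad x = \frac{1}{\sqrt{a+b}} \begin{pmatrix} \sqrt{a} \\ \sqrt{b} \end{pmatrix}. \qquad \text{(A1)} $$
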
The leading eigenvalue reads $\lambda = \sqrt{ab}$, and in this
case the weights $x_i$ given by the corresponding eigenvector
are proportional to the square root of the flow between
the vertices. Obviously this is no longer the case for
larger graphs,



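$$ C_3 = \begin{pmatrix} 0 & a & 0 \\ 0 & 0 & b \\ c & 0 & 0 \end{pmatrix}, \qquad x \propto \begin{pmatrix} a^{2/3} b^{1/3} \\ b^{2/3} c^{1/3} \\ a^{1/3} c^{2/3} \end{pmatrix}. \qquad \text{(A2)} $$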
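The closed forms (A1) and (A2) are easy to confirm numerically. The sketch below is an illustration only: it assumes NumPy, arbitrary sample values a = 4, b = 6, c = 5, and a hypothetical helper `leading_pair` that selects the eigenvalue with the largest real part (for these non-negative examples, the Perron root) and returns its eigenvector with unit Euclidean norm, the normalization used in the numerical examples of this appendix.

```python
import numpy as np


def leading_pair(C):
    """Return the leading (Perron) eigenvalue of C and its unit-norm eigenvector."""
    vals, vecs = np.linalg.eig(np.asarray(C, dtype=float))
    k = np.argmax(vals.real)              # Perron root has the largest real part
    v = np.abs(vecs[:, k].real)           # fix the overall sign to be positive
    return vals[k].real, v / np.linalg.norm(v)


a, b, c = 4.0, 6.0, 5.0                   # arbitrary sample fluxes

# (A1): lambda = sqrt(ab), x proportional to (sqrt(a), sqrt(b))
lam2, x2 = leading_pair([[0, a], [b, 0]])
x2_closed = np.array([np.sqrt(a), np.sqrt(b)]) / np.sqrt(a + b)

# (A2): lambda = (abc)^(1/3), x proportional to the fractional powers
lam3, x3 = leading_pair([[0, a, 0], [0, 0, b], [c, 0, 0]])
x3_closed = np.array([a ** (2 / 3) * b ** (1 / 3),
                      b ** (2 / 3) * c ** (1 / 3),
                      a ** (1 / 3) * c ** (2 / 3)])
x3_closed /= np.linalg.norm(x3_closed)

print(np.round(x2, 4), np.round(x3, 4))
```

For a = 4 and b = 6 the vector of (A1) equals (0.6325, 0.7746), the same pair of numbers that appears in the later examples of this appendix.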



The following numerical example shows that the weights
given by the leading eigenvector grow more slowly than
linearly with the average entry in each row,

$$ x = \begin{pmatrix} 0.3223 \\ 0.5738 \\ 0.7755 \\ 0.9409 \end{pmatrix}. \qquad \text{(A3)} $$

$$ x = \begin{pmatrix} 0.6325 \\ 0.7746 \end{pmatrix}. \qquad \text{(A4)} $$



Observe that quotations by authors whose papers were
never cited do not contribute at all to the weighting
index!
(A5)



Similarly, a quotation by a junior scientist whose papers
have received little attention from the scientific community
is much less important than a citation by an accomplished
author. This is seen by comparing the third and the fourth
components of the eigenvector of the above citation matrix,
in which the first two rows represent a renowned researcher
and a less experienced author, respectively.

It is illustrative to analyze the case of two weakly
connected subgraphs, represented below by the first and
the second pair of nodes. If the coupling between the
subgraphs is symmetric, $C_{24} = C_{42}$, the leading
eigenvector lives in both subspaces,



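$$ C = \begin{pmatrix} 0 & 4 & 0 & 0 \\ 6 & 0 & 0 & 1 \\ 0 & 0 & 0 & 4 \\ 0 & 1 & 6 & 0 \end{pmatrix}, \qquad x = \begin{pmatrix} 0.4197 \\ 0.5691 \\ 0.4197 \\ 0.5691 \end{pmatrix}. \qquad \text{(A6)} $$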
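The eigenvector in (A6) can be reproduced directly, here with NumPy (assumed available). By the symmetry $x_1 = x_3$, $x_2 = x_4$ the eigen-equations reduce to $\lambda x_1 = 4x_2$ and $\lambda x_2 = 6x_1 + x_2$, hence $\lambda^2 - \lambda - 24 = 0$ and $\lambda = (1+\sqrt{97})/2 \approx 5.4244$; the sketch checks this closed form against a direct diagonalization.

```python
import numpy as np

# Symmetric coupling C_24 = C_42 = 1 between the pairs (1, 2) and (3, 4), Eq. (A6)
C = np.array([[0, 4, 0, 0],
              [6, 0, 0, 1],
              [0, 0, 0, 4],
              [0, 1, 6, 0]], dtype=float)

vals, vecs = np.linalg.eig(C)
k = np.argmax(vals.real)            # Perron root
x = np.abs(vecs[:, k].real)
x /= np.linalg.norm(x)              # unit Euclidean norm, as in (A6)

# Closed form obtained from the reduced eigen-equations
lam = (1 + np.sqrt(97)) / 2
print(round(vals[k].real, 4), np.round(x, 4))
```

By symmetry each pair of nodes carries exactly one half of the squared weight.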



However, if the fluxes between the two subgraphs start
to differ, the weight of the leading eigenvector moves
toward the distinguished subsystem,



C = 
(A7)



(A8) 
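The qualitative effect can be reproduced with a hypothetical asymmetric variant of the matrix in (A6), in which the coupling entries are made unequal, $C_{24} = 2$ and $C_{42} = 1$; these values are illustrative assumptions, not the entries of (A7) and (A8). Solving the eigen-equations by hand gives $\lambda = 4\sqrt{2}$ and $x \propto (1/\sqrt{2}, 1, 1/2, 1/\sqrt{2})$, so the first pair carries 2/3 of the squared weight instead of the 1/2 of the symmetric case. The sketch (NumPy assumed) confirms this.

```python
import numpy as np

# Hypothetical asymmetric coupling: C_24 = 2, C_42 = 1 (illustrative values only)
C = np.array([[0, 4, 0, 0],
              [6, 0, 0, 2],
              [0, 0, 0, 4],
              [0, 1, 6, 0]], dtype=float)

vals, vecs = np.linalg.eig(C)
k = np.argmax(vals.real)            # Perron root
x = np.abs(vecs[:, k].real)
x /= np.linalg.norm(x)

# Share of the squared weight carried by each pair of nodes
first, second = np.sum(x[:2] ** 2), np.sum(x[2:] ** 2)
print(np.round(x, 4), round(first, 3), round(second, 3))
```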



If the two subgraphs are not connected, the leading
eigenvalue is degenerate and one finds a corresponding
eigenvector localized exclusively in the more populated
subspace,



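$$ C = \begin{pmatrix} 0 & 4 & 0 & 0 \\ 6 & 0 & 0 & 0 \\ 0 & 0 & 0 & \ast \\ 0 & 0 & 3 & 0 \end{pmatrix}, \qquad x = \begin{pmatrix} 0.6325 \\ 0.7746 \\ 0 \\ 0 \end{pmatrix}. \qquad \text{(A9)} $$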
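Localization in a disconnected graph is easy to observe numerically. In the sketch below (NumPy assumed) the first pair uses the entries 4 and 6 of (A9), while the second pair $\begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix}$ is a hypothetical choice. For a block-diagonal matrix the leading eigenvector is the Perron vector of the block with the larger leading eigenvalue ($\sqrt{24}$ versus $\sqrt{3}$ here), padded with zeros on the other block.

```python
import numpy as np

# Two disconnected pairs of nodes; the second pair is a hypothetical choice.
C = np.array([[0, 4, 0, 0],
              [6, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 3, 0]], dtype=float)

vals, vecs = np.linalg.eig(C)
k = np.argmax(vals.real)            # Perron root sqrt(24) of the first block
x = np.abs(vecs[:, k].real)
x /= np.linalg.norm(x)
print(np.round(x, 4))               # weight only on the first pair
```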



To lift such a degeneracy one may modify the analyzed
matrix C by forming its convex combination with the flat
matrix S such that $S_{ij} = 1/N$. In this way one ensures
that the leading eigenvector of $C(p) = (1-p)C + pS$ can
be obtained by iterating the flat vector with all entries
equal, $w_i = 1/N$, sufficiently many times by the matrix $C(p)$.
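A minimal power-iteration sketch of this procedure, assuming NumPy: it forms $C(p)$, starts from the flat vector and rescales each iterate to unit Euclidean norm. The value p = 0.15 (the complement of the damping factor 0.85 familiar from PageRank), the tolerance, the iteration cap and the function name `leading_eigenvector` are assumptions for illustration; the example matrix reuses two disconnected pairs whose second block is hypothetical.

```python
import numpy as np


def leading_eigenvector(C, p=0.15, tol=1e-12, max_iter=100_000):
    """Power iteration for the leading eigenvector of C(p) = (1 - p) C + p S.

    S is the flat matrix with S_ij = 1/N. Iteration starts from the flat
    vector w_i = 1/N; every iterate is rescaled to unit Euclidean norm.
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    Cp = (1.0 - p) * C + p * np.full((n, n), 1.0 / n)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        y = Cp @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return y
        x = y
    raise RuntimeError("power iteration did not converge; increase max_iter")


# Two disconnected pairs; the entries of the second pair are hypothetical.
C = [[0, 4, 0, 0],
     [6, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 3, 0]]
x = leading_eigenvector(C, p=0.15)
print(np.round(x, 4))
```

For p > 0 every entry of $C(p)$ is positive, so by the Frobenius-Perron theorem the limit is a unique vector with strictly positive components: the second pair now receives small but non-zero weights instead of the zeros of the disconnected case.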



Appendix B: Practical remarks on evaluating the 
weighting vector 

1. Selection of the data 

The key issue in constructing the citation graph is access
to a reliable database containing the scientific literature.
For instance, one may rely on the data provided by the ISI
Web of Science, although some experts claim that it is
biased toward journals published in English and does not
cover the entire literature uniformly. Alternatively, one
may choose to use a publicly available web search engine,
like Google Scholar. In this case it is believed that Google
does not systematically cover the older scientific
literature. Furthermore, it is not clear how to set simple
criteria for which web documents should be taken into
account. On the one hand, one might restrict attention to
papers published in scientific journals that appear on a
previously compiled list of accepted sources. On the other
hand, due to the popularity of various web archives and
preprint repositories (like arxiv.org), one might also
accept formally unpublished preprints posted there. In
such a case special care has to be taken to avoid double
counting the same article, first deposited in an archive
and later published in a journal, often under a slightly
changed title.
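One simple heuristic for spotting such duplicates, offered here as an assumption rather than as the method used in this work, is to compare normalized titles with Python's standard `difflib.SequenceMatcher`. The helper names, the threshold 0.9 and the sample titles are hypothetical; on real data the cut-off would need tuning and the matches would need manual spot checks.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lower-case a title and keep only alphanumeric words separated by spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def deduplicate(titles, threshold=0.9):
    """Keep the first of every group of near-identical titles.

    Two titles count as the same article when the SequenceMatcher ratio of
    their normalized forms reaches `threshold` (a hypothetical cut-off).
    """
    kept, kept_norm = [], []
    for title in titles:
        norm = normalize_title(title)
        if all(SequenceMatcher(None, norm, other).ratio() < threshold
               for other in kept_norm):
            kept.append(title)
            kept_norm.append(norm)
    return kept


records = [
    "Citation graphs and weighted impact factors",  # hypothetical preprint title
    "Citation graph and weighted impact factors",   # hypothetical journal title
    "Quantum chaos in billiards",
]
print(deduplicate(records))
```

The preprint and journal versions, which differ by a single letter, are merged, while the unrelated title is kept.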



2. Different fields of science 

As illustrated by the simple matrix examples of Appendix A,
if two fields of science are not coupled by any
cross-citations, the leading eigenvector describes only
scientists working in the larger field. Similarly, if two
fields of science are coupled only weakly by a few
cross-citations, the leading eigenvector tends to be
localized in the subgraph with more scientists, papers and
citations, so the weighting factors handicap researchers
working in a less popular subfield. The splitting of the
entire graph into subgraphs can be defined in an objective
way by applying the recent method of Newman [37] to find
the community structure in the citation graph. Since it is
well known that citation patterns depend on the branch of
science [29], [3(|, one should rather analyze the two
subgraphs separately, or renormalize the leading
eigenvector separately for a given subfield. This is
consistent with a rather general 'rule of thumb':
bibliometric data should be normalized against the average
computed for scientists working in a similar field of
science in the corresponding window of time IH.
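Analyzing subfields separately can be sketched as follows, assuming NumPy and assuming that the subfield labels of all nodes are already known (for instance from a community-detection step). For each subfield the sketch takes the corresponding diagonal block of C, computes its leading eigenvector and rescales it to mean 1, so that each weight reads as "relative to the subfield average"; this mean-one normalization and the name `per_field_weights` are illustrative choices, not prescriptions from the text.

```python
import numpy as np


def per_field_weights(C, labels):
    """Leading-eigenvector weights computed separately inside each subfield.

    `labels[i]` names the subfield of node i (assumed to be given). Each
    subfield's Perron vector is rescaled so that its mean equals 1.
    """
    C = np.asarray(C, dtype=float)
    labels = np.asarray(labels)
    weights = np.empty(len(labels))
    for field in np.unique(labels):
        idx = np.flatnonzero(labels == field)
        vals, vecs = np.linalg.eig(C[np.ix_(idx, idx)])   # diagonal block
        v = np.abs(vecs[:, np.argmax(vals.real)].real)    # Perron vector
        weights[idx] = v / v.mean()
    return weights


# Two disconnected pairs; the second pair is a hypothetical smaller subfield.
C = [[0, 4, 0, 0],
     [6, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 3, 0]]
print(np.round(per_field_weights(C, ["A", "A", "B", "B"]), 4))
```

With the global leading eigenvector of this matrix the second pair would receive zero weight; computed per subfield, both pairs obtain weights on a comparable scale.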



3. Degeneracy in names

It might not be easy to distinguish papers written by
different scientists who publish under the very same name
[lH. In principle one may try to distinguish them by the
scientific discipline, the affiliations and the time window
of their publishing activity, but it is unlikely that the
success rate will approach unity. On the other hand, it is
reasonable to conjecture that not distinguishing between
scientists with the same name will not much affect the
weighting indices of all other researchers in the graph,
as the weights of the links will be taken as [381 [3


4. Period of the scientific activity

It would be unwise to compare the weighting indices of
two researchers of very different age or living in
different times. The number of universities, scientists,
journals, papers and citations keeps growing fast. Hence
one should expect that a comparison of two scientists with
equally valuable accomplishments, whose scientific
contributions are already forgotten (and whose papers are
no longer quoted), would reveal that the scientist active
more recently is characterized by a larger weighting
factor.


5. Uniqueness of the leading eigenvector of the
citation matrix

A matrix C is called reducible if it can be transformed
by a permutation P into a matrix with a zero block below
the diagonal,

$$ P C P^{-1} = \begin{pmatrix} D_1 & Z \\ 0 & D_2 \end{pmatrix}, $$

where $D_1$ and $D_2$ are square matrices. Otherwise the
matrix is called irreducible. The Frobenius-Perron theorem
implies that for any irreducible non-negative matrix C the
leading eigenvalue $z_1$ is real, positive and simple, so the
real eigenvector x corresponding to it is unique (up to
normalization) and has positive entries. If C is moreover
primitive (irreducible and aperiodic), the spectral gap is
positive, $\gamma := z_1 - |z_2| > 0$. The size of the spectral
gap governs the speed of convergence of any initial vector,
iterated several times by C, to the invariant state $x = Cx$.
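Both conditions can be tested numerically for small matrices. The sketch below (NumPy assumed; function names hypothetical) uses the standard criterion that a non-negative N × N matrix is irreducible if and only if $(I + A)^{N-1}$ has no zero entry, A being its zero/non-zero pattern, and computes $\gamma = z_1 - |z_2|$ from the eigenvalue moduli.

```python
import numpy as np


def is_irreducible(C):
    """Irreducibility test for a non-negative square matrix.

    Checks that (I + A)^(N-1) has no zero entry, where A is the zero/non-zero
    pattern of C; repeated squaring of a 0/1 matrix avoids overflow.
    """
    A = (np.asarray(C) > 0).astype(int)
    n = A.shape[0]
    R = np.minimum(np.eye(n, dtype=int) + A, 1)
    steps = 1
    while steps < n - 1:              # R encodes the pattern of (I + A)^steps
        R = np.minimum(R @ R, 1)
        steps *= 2
    return bool(R.all())


def spectral_gap(C):
    """Return z1 - |z2|: largest minus second-largest eigenvalue modulus."""
    mods = np.sort(np.abs(np.linalg.eigvals(np.asarray(C, dtype=float))))[::-1]
    return mods[0] - mods[1] if mods.size > 1 else mods[0]


C2 = [[0, 4], [6, 0]]                                    # matrix (A1), a=4, b=6
C6 = [[0, 4, 0, 0], [6, 0, 0, 1], [0, 0, 0, 4], [0, 1, 6, 0]]  # matrix (A6)
block = np.array([[0, 4, 0, 0], [6, 0, 0, 0],
                  [0, 0, 0, 1], [0, 0, 3, 0]], dtype=float)  # hypothetical 2nd block
mixed = 0.85 * block + 0.15 / 4                          # C(p) with p = 0.15
for M in (C2, C6, block, mixed):
    print(is_irreducible(M), round(spectral_gap(M), 4))
```

The matrices (A1) and (A6) are irreducible, yet their gap vanishes because their spectra are symmetric ($\pm\sqrt{ab}$ and $\pm\lambda$); the disconnected matrix is reducible, while mixing it with the flat matrix S yields an irreducible matrix with a positive gap.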

The initial citation matrix C analyzed in this paper
could in principle be reducible, but due to numerous
cross-citations between various researchers and subfields
this possibility seems unlikely. Furthermore, the auxiliary
(N+1)-th node of the graph, representing all scientists
outside the ensemble under investigation, introduces
additional connectivity and hence increases (on average)
the spectral gap.

The size of the spectral gap for the graph matrix
describing the entire scientific literature has to be
determined in a numerical experiment. If the gap turns out
to be too small to ensure a convergence time realistic for
practical implementations, one may always introduce a
suitable modification of the citation matrix C. For
instance, following the original idea of PageRank, one
could mix C with the flat matrix S such that $S_{ij} = 1/N$,
see also